Disaster Recovery

Last updated: 2026-04-10

Scenario 1 — LXC is broken, snapshot revert works

Symptoms: Service is down, container won't start, config is corrupted.

Recovery time: ~2 minutes

```
# From the Proxmox host
pct stop 101                    # or 102
pct rollback 101 phase1-complete
pct start 101

# Verify
pct enter 101
cd /opt/edge-gateway            # /opt/n8n on CT 102
docker compose ps               # All services should be Up
```

Note

After rollback, any changes made since the snapshot are lost. This includes n8n workflows created after the snapshot, NPM proxy host changes, and firewall rule edits.
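In the sequence above the container is stopped before the rollback is attempted, so a mistyped snapshot name leaves the service down. A minimal sketch of a guard follows. It assumes the snapshot name appears in the output of `pct listsnapshot`, and `rollback_ct` is a hypothetical helper name, not a Proxmox command.

```shell
# rollback_ct CTID SNAPSHOT
# Hypothetical helper: roll back only if the snapshot is listed for the container.
rollback_ct() {
    local ctid="$1" snap="$2"
    # Assumption: the snapshot name appears as a whole word in `pct listsnapshot` output.
    if ! pct listsnapshot "$ctid" | grep -wF -- "$snap" >/dev/null; then
        echo "snapshot '$snap' not found on CT $ctid" >&2
        return 1
    fi
    pct stop "$ctid" || true    # tolerate a container that is already stopped
    pct rollback "$ctid" "$snap" && pct start "$ctid"
}
```

Usage on the Proxmox host: `rollback_ct 101 phase1-complete`.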

Scenario 2 — LXC is broken, no usable snapshot

Symptoms: Snapshot is also corrupted, or snapshot was never taken.

Recovery time: ~30 minutes per container

Rebuild edge-gateway (CT 101)

Delete the broken container:
```
pct stop 101
pct destroy 101
```
Recreate the container from scratch following Phase 1 Implementation Guide.md, Steps 3.1–3.4.

Restore config files:

- /opt/edge-gateway/docker-compose.yml: copy from Implementation Guide Step 3.4
- /opt/edge-gateway/.env: retrieve TUNNEL_TOKEN from the password manager or the Cloudflare dashboard (Zero Trust → Networks → Tunnels → exzentcg-homelab → Configure → copy token)
- /opt/edge-gateway/npm/data/: holds the NPM proxy host config. If it was backed up, restore it. If not, recreate the proxy host for n8n.exzentcg.com manually (Step 8)
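Restoring the NPM data from a backup can be sketched as below. This assumes archives were made with the `tar czf /root/npm-backup-$(date +%F).tgz ...` command from the backup table and copied somewhere that survived the rebuild. `restore_latest` is a hypothetical helper name.

```shell
# restore_latest BACKUP_DIR PREFIX [TARGET_ROOT]
# Hypothetical helper: unpack the newest PREFIX-YYYY-MM-DD.tgz under TARGET_ROOT.
restore_latest() {
    local backup_dir="$1" prefix="$2" target_root="${3:-/}"
    local f latest=""
    # Globs expand in sorted order and ISO dates sort chronologically,
    # so the last match is the newest archive.
    for f in "$backup_dir"/"$prefix"-*.tgz; do
        if [ -e "$f" ]; then
            latest="$f"
        fi
    done
    if [ -z "$latest" ]; then
        echo "no $prefix archive found in $backup_dir" >&2
        return 1
    fi
    # GNU tar stores absolute paths without the leading "/",
    # so extracting relative to / puts files back in place.
    tar xzf "$latest" -C "$target_root"
    echo "$latest"
}
```

Inside CT 101, with the archives copied back to /root, this would be `restore_latest /root npm-backup`. Run it before `docker compose up -d` so NPM starts with the restored data.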

Fix DNS:

```
cat > /etc/resolv.conf <<'EOF'
nameserver 1.1.1.1
nameserver 1.0.0.1
EOF
chattr +i /etc/resolv.conf
```
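Once the immutable flag is set, rerunning the DNS fix fails because the file can no longer be overwritten. A rerunnable variant might look like this sketch. `write_resolv` is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a scratch file.

```shell
# write_resolv [FILE]
# Hypothetical helper: rewrite the resolver config and lock it, safe to rerun.
write_resolv() {
    local file="${1:-/etc/resolv.conf}"
    # Clear the immutable flag if it is already set; ignore errors on a fresh file.
    chattr -i "$file" 2>/dev/null || true
    # printf repeats its format for each argument, giving one nameserver line each.
    printf 'nameserver %s\n' 1.1.1.1 1.0.0.1 > "$file"
    chattr +i "$file"
}
```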

Start services and verify:

```
cd /opt/edge-gateway
docker compose up -d
docker compose logs --tail 30 cloudflared
# Look for "Registered tunnel connection" lines
```
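The tunnel can take a few seconds to register, so a single look at the logs may come too early. A minimal polling sketch follows. `wait_for_tunnel` is a hypothetical helper name, the retry count and delay are arbitrary defaults, and it must be run from /opt/edge-gateway.

```shell
# wait_for_tunnel [TRIES] [DELAY_SECONDS]
# Hypothetical helper: poll the cloudflared logs until the tunnel registers.
wait_for_tunnel() {
    local tries="${1:-10}" delay="${2:-3}" i=0
    while [ "$i" -lt "$tries" ]; do
        if docker compose logs --tail 100 cloudflared 2>&1 |
            grep -F 'Registered tunnel connection' >/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no tunnel registration seen after $tries checks" >&2
    return 1
}
```

Usage: `cd /opt/edge-gateway && docker compose up -d && wait_for_tunnel`.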

Recreate firewall rules — copy /etc/pve/firewall/101.fw from Phase 1 Actions.md Step 5.1

Rebuild n8n-app (CT 102)

Delete and recreate the container following Implementation Guide Steps 4.1–4.4.

Restore config files:

- /opt/n8n/docker-compose.yml: copy from Implementation Guide Step 4.4
- /opt/n8n/.env: retrieve N8N_ENCRYPTION_KEY from the password manager

Restore n8n data:

- If /opt/n8n/data/ was backed up, restore it and run chown -R 1000:1000 ./data from /opt/n8n.
- If it was not backed up, n8n starts fresh: all workflows, credentials, and the owner account are lost, and you must redo the setup wizard.

Fix permissions and start:

```
cd /opt/n8n
chown -R 1000:1000 ./data
docker compose up -d
```
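To confirm the ownership fix took effect after a restore, something like the following sketch can list any stragglers. `list_wrong_owner` is a hypothetical helper name, and 1000 is simply the UID used by the `chown` above.

```shell
# list_wrong_owner DIR UID
# Hypothetical helper: print paths under DIR not owned by UID; fail if any exist.
list_wrong_owner() {
    local dir="$1" uid="$2" bad
    bad="$(find "$dir" ! -uid "$uid")"
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
}
```

Usage: `list_wrong_owner /opt/n8n/data 1000 && echo "ownership OK"`.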

Recreate firewall rules — copy /etc/pve/firewall/102.fw from Phase 1 Actions.md Step 5.2

Scenario 3 — Proxmox host dies completely

Symptoms: Hardware failure, disk corruption, total loss.

Recovery time: ~2 hours

What you need:

- A new machine (or repaired hardware)
- Proxmox VE ISO (download from proxmox.com)
- This Obsidian vault (stored on your laptop, not on the Proxmox host)
- Access to your password manager

Steps:

1. Install Proxmox VE fresh on the new hardware
2. Set the management IP to 192.168.0.200 (or update all references)
3. Install Tailscale: `curl -fsSL https://tailscale.com/install.sh | sh && tailscale up`
4. Recreate Datacenter firewall IP sets (Step 1 of Implementation Guide)
5. Enable Datacenter firewall (Step 2)
6. Create node firewall rules (Step 0.2)
7. Recreate CT 101 edge-gateway (Steps 3.1–3.4)
8. Recreate CT 102 n8n-app (Steps 4.1–4.4)
9. Apply container firewall rules (Step 5)
10. Start cloudflared with stored tunnel token (Steps 7.2–7.3)
11. Recreate NPM proxy host (Step 8)
12. Verify end-to-end: https://n8n.exzentcg.com
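The final verification step can be scripted from any machine with `curl`. This is a sketch under one assumption: a 2xx or 3xx response counts as healthy, since a Cloudflare Access policy in front of the hostname may answer with a redirect to its login page rather than a plain 200. `http_ok` is a hypothetical helper name.

```shell
# http_ok URL
# Hypothetical helper: succeed if URL answers with a 2xx or 3xx status.
http_ok() {
    local url="$1" code
    # -w '%{http_code}' prints only the status code; the body is discarded.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$url")" || code="000"
    case "$code" in
        2??|3??) echo "OK $code $url" ;;
        *)       echo "FAIL $code $url" >&2; return 1 ;;
    esac
}
```

Usage: `http_ok https://n8n.exzentcg.com`.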

Warning

Cloudflare-side config (tunnel, Access policies, DNS) survives a host death. You do NOT need to recreate the tunnel, DNS records, or Access applications. Only the on-premises infrastructure needs rebuilding.

Scenario 4 — N8N_ENCRYPTION_KEY lost

Symptoms: n8n starts but all credential nodes show errors. Workflows that use stored API keys/tokens fail.

Recovery: There is no recovery. n8n encrypts stored credentials with this key (AES-256), so without it the encrypted credential blobs in n8n's SQLite database are unreadable.

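Mitigation:

1. Generate a new encryption key and update .env. Do this first, because credentials are encrypted with whichever key is active when they are saved. Recreate the container afterwards: `docker compose restart` does not pick up .env changes.

```
openssl rand -hex 32
nano /opt/n8n/.env   # replace old key with new
cd /opt/n8n
docker compose up -d --force-recreate n8n
```

2. Re-enter every credential manually in n8n
3. Re-test every workflow that uses credentials
4. This time, back up the key in two places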
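One way to check that both backup copies match the live key without displaying the key itself is to compare short fingerprints. This is only a sketch: `key_fingerprint` is a hypothetical helper name, and the 12-character truncation is an arbitrary choice.

```shell
# key_fingerprint KEY
# Hypothetical helper: print the first 12 hex characters of SHA-256(KEY).
key_fingerprint() {
    printf '%s' "$1" | sha256sum | cut -c1-12
}
```

Usage, assuming .env holds an unquoted `N8N_ENCRYPTION_KEY=...` line: `key_fingerprint "$(grep '^N8N_ENCRYPTION_KEY=' /opt/n8n/.env | cut -d= -f2-)"`. Run the same function on each backup copy and compare the output.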

Scenario 5 — Cloudflare Tunnel token compromised

Symptoms: Someone has your tunnel token and could potentially route traffic through your tunnel.

Recovery:

1. Go to Cloudflare Zero Trust → Networks → Tunnels → exzentcg-homelab
2. Rotate the tunnel token (or delete and recreate the tunnel)
3. Copy the new token
4. Update the token on edge-gateway, then recreate the container. `docker compose restart` does not pick up .env changes.

```
pct enter 101
cd /opt/edge-gateway
nano .env   # replace TUNNEL_TOKEN value
docker compose up -d --force-recreate cloudflared
docker compose logs --tail 30 cloudflared
# Verify "Registered tunnel connection" appears
```

5. If you recreated the tunnel, also re-add the public hostname route and update the DNS CNAME
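The .env edit can also be done without an editor, which avoids paste errors with long tokens. A minimal sketch, assuming .env uses plain `KEY=value` lines. `set_env_var` is a hypothetical helper name. It avoids `sed` on purpose, because tokens can contain `/`, `+`, and `=` characters that would need escaping.

```shell
# set_env_var FILE KEY VALUE
# Hypothetical helper: replace (or add) a KEY=VALUE line in an env file.
set_env_var() {
    local file="$1" key="$2" value="$3" tmp
    tmp="$(mktemp)"
    # Keep every line except an existing assignment for KEY.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage inside CT 101: `set_env_var /opt/edge-gateway/.env TUNNEL_TOKEN 'paste-new-token-here'`, then recreate the cloudflared container.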

Backup Strategy (recommended)

| What | How | Frequency |
| --- | --- | --- |
| Proxmox LXC snapshots | `pct snapshot <id> <name>` | After any significant change |
| n8n data directory | `tar czf /root/n8n-backup-$(date +%F).tgz /opt/n8n/data/` from CT 102 | Weekly or before n8n updates |
| NPM data directory | `tar czf /root/npm-backup-$(date +%F).tgz /opt/edge-gateway/npm/` from CT 101 | After proxy host changes |
| This Obsidian vault | Git repo or cloud sync (OneDrive, etc.) | Continuous |
| Password manager | Cloud-synced (Bitwarden, 1Password) | Continuous |
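The two `tar` rows can be wrapped in a small function that also prunes old archives. This is a sketch: the retention count is an assumption that the table does not specify, and `backup_and_prune` is a hypothetical helper name. The source path must be absolute.

```shell
# backup_and_prune SRC_DIR DEST_DIR PREFIX [KEEP]
# Hypothetical helper: archive SRC_DIR as PREFIX-YYYY-MM-DD.tgz, keep the newest KEEP.
backup_and_prune() {
    local src="$1" dest="$2" prefix="$3" keep="${4:-4}"
    local archive="$dest/$prefix-$(date +%F).tgz"
    # Store the path relative to /, matching what GNU tar does with absolute paths.
    tar czf "$archive" -C / "${src#/}"
    # List newest first, skip the first KEEP entries, delete the rest.
    ls -1 "$dest"/"$prefix"-*.tgz | sort -r | tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
    echo "$archive"
}
```

Usage from CT 102: `backup_and_prune /opt/n8n/data /root n8n-backup 4`. Archives kept inside the container are lost with it, so copy them elsewhere if they are meant to cover Scenario 2 or 3.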